IceNLP: a natural language processing toolkit for Icelandic
Authors
Abstract
Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, the challenges faced, and the decisions made during development. The current version of the toolkit consists of a tokeniser/sentence segmentiser, a morphological analyser, a linguistic rule-based tagger, and a finite-state parser. The development of our toolkit is a step towards building a Basic Language Resource Kit (BLARK) for the Icelandic language.
Similar resources
Apertium-IceNLP: A rule-based Icelandic to English machine translation system
We describe the development of a prototype of an open source rule-based Icelandic→English MT system, based on the Apertium MT framework and IceNLP, a natural language processing toolkit for Icelandic. Our system, Apertium-IceNLP, is the first system in which the whole morphological and tagging component of Apertium is replaced by modules from an external system. Evaluation shows that the word e...
Parse Trees of Arabic Sentences Using the Natural Language Toolkit
We develop a framework for using the Natural Language Toolkit (NLTK) to parse Quranic Arabic sentences. This framework supports the construction of a treebank for the Holy Quran. The proposed model succeeds in parsing different Quranic chapters (Suras) in addition to Modern Standard Arabic (MSA) sentences. The availability of such a parser will be useful in various natural language processing app...
NLTK: The Natural Language Toolkit
The Natural Language Toolkit is a suite of program modules, data sets, tutorials and exercises, covering symbolic and statistical natural language processing. NLTK is written in Python and distributed under the GPL open source license. Over the past three years, NLTK has become popular in teaching and research. We describe the toolkit and report on its current state of development.
Langutils: A Natural Language Toolkit for Common Lisp
In recent years, Natural Language Processing (NLP) has emerged as an important capability in many applications and areas of research. Natural language can be both the domain of application and an important component in the human-computer interface. This paper describes the design and implementation of "langutils," a high-performance natural language toolkit for Common Lisp. We introduce the tech...
MMFeat: A Toolkit for Extracting Multi-Modal Features
Research at the intersection of language and other modalities, most notably vision, is becoming increasingly important in natural language processing. We introduce a toolkit that can be used to obtain feature representations for visual and auditory information. MMFEAT is an easy-to-use Python toolkit, which has been developed with the purpose of making non-linguistic modalities more accessible ...